DOCUMENT RESUME

ED 167 >i(9                                        TM 800 25{l

AUTHOR        Golub-Smith, Marna
TITLE         The Application of Rasch Model Equating Techniques to the Problem of Interpreting Longitudinal Performance on Minimum Competency Tests.
PUB DATE      Apr 80
NOTE          14 7p.; Paper presented at the Annual Meeting of the American Educational Research Association (64th, Boston, MA, April 7-11, 1980).

EDRS PRICE    MF01/PC0^ Plus Postage.
DESCRIPTORS   Elementary Secondary Education; *Goodness of Fit; Mathematics; *Minimum Competency Testing; Reading Achievement
IDENTIFIERS   Linear Models; New Jersey; New Jersey Minimum Basic Skills Program; *Rasch Model; *Test Equating

ABSTRACT

This study presents an application of the Rasch equating methodology on a minimum competency testing program in reading and mathematics. Common item equating was performed in two stages to link the 1978-1979 form of the New Jersey Minimum Basic Skills tests to the 1977-1978 form. Both fit to the Rasch model and stability of the common item pools were investigated prior to the actual equating. Equivalent raw scores derived from the Rasch methodology were compared with a prior linear equating. Special attention was paid to those raw score points around the state's cut-off score. Results from the study indicated moderate to good fit of the tests to the Rasch model. Several "unstable" equating items were found and the equating was therefore carried out in two ways: 1) using all twenty-five equating items, and 2) using just the "stable" equating items. Raw scores derived from both the Rasch and linear methods showed close though not perfect agreement. The equating using only the "stable" equating items changed six out of the seven equating tables. In only two of these six tables did this change move the scores closer to that given by the linear method. (Author/GSK)



THE APPLICATION OF RASCH MODEL EQUATING TECHNIQUES
TO THE PROBLEM OF INTERPRETING LONGITUDINAL
PERFORMANCE ON MINIMUM COMPETENCY TESTS

MARNA GOLUB-SMITH
NEW JERSEY STATE DEPARTMENT OF EDUCATION

Paper presented at the annual meeting of the American Educational Research Association, Boston, Massachusetts, April 1980.






The New Jersey Minimum Basic Skills (MBS) program is a minimum competency testing program in reading and mathematics. Each spring both tests are administered to students in grades three, six, nine and eleven. Beginning with the 1977-1978 school year, the New Jersey State Board of Education adopted scores of 75 percent and 65 percent correct as passing or "cut-off" scores on the reading and mathematics tests, respectively.

Because the law in New Jersey mandates the release of items after each test administration, new forms of the tests are developed each year. Although the test development process tries to ensure comparability of items from one year to the next, present test construction techniques do not guarantee that two or more forms of a test, developed from the same set of specifications, will be perfectly equivalent. In order, then, to ensure that the level of achievement required of students, as defined by the state cut-off score, would be the same on subsequent forms of the tests, it was decided to equate each new form to the previous one and hence to the original scale defined by the 1977-1978 form. Only with equated forms can one evaluate changes in student performance from one year to the next.



TEST DEVELOPMENT PROCESS

In order to understand the procedures used to equate the annual forms of the Minimum Basic Skills tests, it is necessary to have an overview of the test development process. The development of a new form from item selection to final administration generally encompasses about nine months.

During early summer a statewide committee of reading and mathematics content specialists meets to review and edit items prepared by the test contractor.¹ A complete item-to-item replacement is made for each new test. In early fall these items are field-tested with school children in grades four, seven, ten and twelve (closest in ability to spring third, sixth, ninth and eleventh graders). The committee then reconvenes to review the results of the field test. Items are revised or replaced if necessary, and the new forms of the tests are ready for administration in the spring.

Several operational features of the Minimum Basic Skills program predetermined the method for equating future test forms:

1. A three-week turnaround for reporting of results to local districts;

2. The inability to have a secure equating section on each year's final form test, as each year's test booklets remain in the local districts;

3. The use of previous year's items by classroom teachers to prepare students for the test.

Therefore, during the development of the second annual Minimum Basic Skills tests, it was decided to equate the two forms using an anchor test of twenty-five items selected from the 1977-1978 form and included on the field test of the 1978-1979 form. Angoff's Design IV (1971, p. 579) was performed by the test contractor (Swineford, 1979).

¹The contractor for the Minimum Basic Skills program during 1977-1978 and 1978-1979 was Educational Testing Service.

PURPOSE OF THE STUDY

The present study was undertaken to answer the following questions:

1. Could the Rasch model equating methodology be applied to a minimum competency test which was not specifically designed to fit the model?

2. How do the results of test form equating based on the traditional linear method compare with the results from the Rasch method?



METHODS OF EQUATING

Several methods are available to equate tests. Each provides a way of converting the system of units of one form to the system of units of another so that scores obtained after conversion will be equivalent. This notion of conversion implies two restrictions (Angoff, 1971):

- The two forms must be measures of the same characteristic;

- The conversion must be unique except for random error associated with the unreliability of the data and the method used for the transformation.

The two methods which were compared in this study were the linear and Rasch model methods of equating using a common set of anchor items.



The linear method of equating defines two scores as equivalent if they correspond to equal standard score deviates. This method is based on the assumption that the shapes of the raw score distributions of two tests are identical. The use of linear equating with common items (given to two separate non-random groups) is described by Angoff (1971, Design IV, p. 579). The major assumptions of this form of linear equating are:

- The regression system for the two groups of students would have been identical if the two groups had taken the same test;

- The common item set represents the same psychological function in both groups;

- The two groups do not differ very much in ability (they are not assumed to be equivalent).
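
The definition of equivalence by equal standard score deviates can be sketched in a few lines. This is only a minimal illustration of the basic linear conversion, not the contractor's Design IV procedure, which additionally uses the anchor items to estimate the means and standard deviations for a common population. The function name `linear_equate` and all raw scores below are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(x, scores_x, scores_y):
    """Return the form-Y score that has the same standard score deviate
    (z-score) as raw score x has on form X."""
    mu_x, sd_x = mean(scores_x), pstdev(scores_x)
    mu_y, sd_y = mean(scores_y), pstdev(scores_y)
    # Equal deviates: (x - mu_x) / sd_x == (y - mu_y) / sd_y, solved for y.
    return mu_y + sd_y * (x - mu_x) / sd_x


# Hypothetical raw score distributions, invented for illustration only.
form_x = [60, 70, 80, 90]
form_y = [55, 65, 75, 85]

# A score at the mean of form X maps to the mean of form Y.
print(linear_equate(75, form_x, form_y))
```

Because the conversion is a straight line, it is only as good as the assumption stated above that the two raw score distributions have the same shape.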

The Rasch model, a one parameter latent trait model, provides a method of equating two tests using a common set of anchor items (Wright, 1977; Beard and Pettie, 1979). This method defines two scores as equivalent if they correspond to the same Rasch "log" ability values. These procedures are based on the estimation of equating constants which transform the item difficulties and ability estimates from one test on to the scale of the base test. The attractiveness of this method stems from the property of this model labeled "Specific Objectivity" (Rasch, 1966), i.e., the difficulties of the items are estimated independently of the ability of the calibrating sample and the estimates of the abilities are independent of the particular set of items.
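
A short sketch may help make the one parameter model concrete. It assumes the usual logistic form of the Rasch model, in which the log odds of a correct response equal ability minus difficulty; the function names and logit values are hypothetical. The loop shows the idea behind specific objectivity: the log-odds gap between two items is the same at every ability level.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the one parameter model,
    with ability and difficulty on the same log (logit) scale."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def log_odds(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Hypothetical item difficulties in logits, invented for illustration.
easy_item, hard_item = -1.0, 0.5

for ability in (-2.0, 0.0, 2.0):
    gap = (log_odds(rasch_probability(ability, easy_item))
           - log_odds(rasch_probability(ability, hard_item)))
    # The gap equals hard_item - easy_item whatever the ability is.
    print(ability, round(gap, 6))
```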

METHODOLOGY

Description of the Tests

Each of the Minimum Basic Skills reading tests contains items which can be categorized into three major cluster areas: word recognition, comprehension and study skills. The mathematics test items can be broken down into four major cluster areas: computation, measurement/geometry, number concepts and problem solving. The number of items in each test ranges from 90 to 110.

Selection of the Pool of Equating Items

Twenty-five items from each of the 1977-1978 Minimum Basic Skills tests were embedded into the 1978-1979 field test. This figure was approximately 22 to 27 percent of the total items on the original form. Items were selected from each of the cluster areas of the test in the same proportion in which they appeared on the original form. Other item characteristics, such as p-values and biserials, were considered when making the selection, in order to represent as accurately as possible the parameters of the original form. Since items were selected solely for the purpose of traditional equating, no Rasch item statistics were used to select the pool. The resulting twenty-five item sections were truly "Minimum Basic Skills mini tests" in that they mirrored very closely the original form in both content and level of difficulty.

For the purpose of the equating, this section of items was placed at the end of the test. While in retrospect this was probably not the best place for these items due to such factors as fatigue, boredom, etc., operational simplicity of the field test took precedence in this first equating of the Minimum Basic Skills tests.



Description of the Samples

Three separate samples of test data were used to equate each of the eight Minimum Basic Skills tests. They were:

1. 1978-1979 FIELD TEST SAMPLE

   This sample consisted of students in the fourth, seventh, tenth and twelfth grades who took one of the tests, either reading or mathematics. Approximately 400 to 500 students took each of the eight tests. A systematic sample of twenty-five students was selected by each participating school. Schools were asked to participate on a volunteer basis; however, they did have to fit into a pre-arranged stratified sampling matrix of geographic region and socioeconomic status. The sampling matrix was designed to provide a truly representative sample of the state.

2. 1977-1978 FINAL FORM SAMPLE

   This sample consisted of a two percent systematic sample generated by computer from the total 400,000 students who took the Minimum Basic Skills tests during the spring of 1978.

3. 1978-1979 FINAL FORM SAMPLE

   This sample consisted of approximately 1,500 to 2,100 students who took the Minimum Basic Skills tests in the spring of 1979. This sample was developed by randomly taking a selection of the earliest returns from the districts after the spring administration. From past experience, these districts provided a good indication of the performance of the state.
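
The two percent systematic sample in item 2 can be pictured as taking every fiftieth record from an ordered file. The paper does not describe how the computer run was actually set up, so the sketch below is an assumption throughout: the function name, the starting position and the use of consecutive integers as stand-ins for student records are all hypothetical.

```python
def systematic_sample(records, fraction, start=0):
    """Take every k-th record, where k = 1 / fraction, beginning at
    position `start` (the starting position is an assumption here)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    step = round(1 / fraction)
    return records[start::step]


# Hypothetical stand-in for the roughly 400,000 student records.
student_ids = range(400_000)
sample = systematic_sample(student_ids, 0.02, start=7)
print(len(sample))
```

A systematic sample of this kind spreads the selected students evenly through the file, which is useful when the file is ordered by district or school.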

Rasch Common Item Equating

Wright (1977) presents a methodology for equating two tests by the use of common items. A pair of separate and independent estimates of difficulty are produced for each item that is common to a pair of tests. When these items are calibrated, the origin is normally set by fixing the average item difficulty to zero, which coincidentally fixes the origin of the ability scale. According to the Rasch model, these common items should have the same average difficulty for both tests. Any difference in average difficulty of the common item set between the two tests indicates a difference in scale origins of the two tests. This difference in average item difficulty of the common items can be used as an equating constant to adjust the scale of either test on to the scale of the other.

The formula for this constant, which translates all item difficulties from the calibration of test b on to the scale of the base test a, is given by Wright (1977, p. 107):

$$T_{ab} = \frac{1}{K}\sum_{i=1}^{K}\left(d_{ia} - d_{ib}\right)$$

where

$T_{ab}$ is the equating constant to transform the difficulties of test b into the scale of test a,
$d_{ia}$ are the common item difficulties on test a,
$d_{ib}$ are the common item difficulties on test b,
$K$ is the number of common items.

To place any item from test b onto the scale of test a one simply adds the constant, i.e., $D_{ib}$ on the scale of a $= d_{ib} + T_{ab}$. Likewise, to transform the ability table from the scale of one test onto the scale of the other, the addition of the constant is also made.
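
The constant described above is simply the mean difference between the two sets of common item difficulties. The following sketch computes it and applies it; the function names and the three difficulty values are hypothetical and stand in for the twenty-five common items.

```python
def equating_constant(d_a, d_b):
    """Mean of (difficulty on test a - difficulty on test b) over the
    common items, listed in the same item order in both arguments."""
    if len(d_a) != len(d_b) or not d_a:
        raise ValueError("need two equal-length, non-empty lists")
    return sum(a - b for a, b in zip(d_a, d_b)) / len(d_a)


def to_scale_a(values_b, t_ab):
    """Shift test-b difficulties (or abilities) onto the scale of test a."""
    return [v + t_ab for v in values_b]


# Hypothetical common item difficulties in logits, for illustration only.
d_a = [-1.0, 0.0, 1.0]   # calibration on base test a
d_b = [-0.7, 0.2, 1.4]   # calibration of the same items on test b

t_ab = equating_constant(d_a, d_b)
shifted = to_scale_a(d_b, t_ab)
print(round(t_ab, 3), [round(v, 3) for v in shifted])
```

After the shift, the common items have the same average difficulty on both scales, which is exactly the condition the constant is meant to enforce.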

Procedures

For each of the eight tests the three samples of data outlined above were calibrated separately using a version of BICAL (Wright and Mead, 1977).

Prior to the actual Rasch equating, bivariate plots of the item difficulties for the twenty-five common items between the 1978-1979 field test and the 1977-1978 final form test were made and analyzed for item stability, as suggested by Rentz (1978) and Beard and Pettie (1979). In addition, since these were tests which were not built using Rasch test development procedures, fit to the model was examined.

The actual equating proceeded in two phases. First, using twenty-five common items, the 1978-1979 field test was put on the scale of the 1977-1978 form by the addition of a constant to the item difficulties of the field test form. Since there were some changes in items from field to final form, a second adjustment or "fine tuning" was necessary. Using the approximately eighty to ninety common items between the 1978-1979 field test and the 1978-1979 final form, the 1978-1979 final form item difficulties were placed on the scale of the 1977-1978 base form.

In order to derive equivalent raw scores, the ability tables for the 1978-1979 final form were also adjusted by the same constant and put on the scale of the 1977-1978 form. Equivalent raw scores were assigned and compared to those from the linear equating method. Special attention was given to those scores at and around the state standard.
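
The two-phase chain described in this section can be sketched as two successive shifts. Everything in the example is hypothetical: the item labels, the logit values and the helper name `mean_shift` are invented, and two items stand in for each common item set. It is meant only to show the order of operations, not to reproduce the BICAL runs.

```python
def mean_shift(base, other, common_ids):
    """Average of (base difficulty - other difficulty) over common items."""
    return sum(base[i] - other[i] for i in common_ids) / len(common_ids)


# Hypothetical calibrations in logits, each on its own scale.
final_1978 = {"a1": -0.5, "a2": 0.5}                         # base form
field_1979 = {"a1": -0.9, "a2": 0.1, "n1": -0.2, "n2": 0.6}  # field test
final_1979 = {"n1": -0.5, "n2": 0.3}                         # new final form

# Phase 1: anchor items put the field test on the base scale.
t1 = mean_shift(final_1978, field_1979, ["a1", "a2"])
field_on_base = {k: v + t1 for k, v in field_1979.items()}

# Phase 2 ("fine tuning"): items shared by field test and final form
# put the new final form on the base scale.
t2 = mean_shift(field_on_base, final_1979, ["n1", "n2"])
final_on_base = {k: v + t2 for k, v in final_1979.items()}

# The ability table of the new final form is shifted by the same constant.
abilities_1979 = {60: 0.10, 61: 0.20}   # raw score -> log ability
abilities_on_base = {r: b + t2 for r, b in abilities_1979.items()}
print(round(t1, 3), round(t2, 3))
```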



DISCUSSION OF RESULTS

In order to answer the first question posed in the study, viz., could the Rasch model equating methodology be applied to tests which were not originally developed to fit the model, two kinds of data were analyzed:

1. Mean square fit statistics and discrimination indices from the BICAL runs.

2. Graphic displays of the difficulty estimates of the twenty-five common equating items.

Analysis of Item Fit

The mean square fit statistics used in this analysis were the total mean square fit values from the BICAL output. Items with a mean square fit of 1.5 or greater were flagged and considered of questionable fit to the model.

The discrimination index or slope is the regression of "item log odds" on "test log odds." Values should be near one for fitting items. A value less than one indicates that the item characteristic curve for that item is flatter than the test characteristic curve. Values greater than one indicate that the item characteristic curve is steeper than the test characteristic curve. The interval 1.0 ± .20 (Cartledge, 1975; Rentz and Bashaw, 1975; Beard and Pettie, 1979) was used as a yardstick by which to evaluate this type of item fit.
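
The two screening rules can be expressed as a small tally of the kind reported in the fit tables. The sketch assumes that the slope interval is inclusive at both ends, which the text does not specify; the function name and the four item statistics are hypothetical.

```python
def fit_summary(items, msf_limit=1.5, slope_low=0.8, slope_high=1.2):
    """Return the percent of items with mean square fit below the limit
    and the percent with slope inside the interval (assumed inclusive)."""
    n = len(items)
    msf_ok = sum(1 for msf, _ in items if msf < msf_limit)
    slope_ok = sum(1 for _, slope in items
                   if slope_low <= slope <= slope_high)
    return round(100 * msf_ok / n), round(100 * slope_ok / n)


# Hypothetical (mean square fit, slope) pairs, invented for illustration.
item_stats = [(0.9, 1.0), (1.6, 0.7), (1.1, 1.3), (1.2, 0.95)]
print(fit_summary(item_stats))
```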

Tables 1, 2 and 3 present an analysis of the total mean square fit statistics and discrimination indices from the calibrations of each of the three samples used for the equating. Analyses were made for both total test and common items alone. In the calibration of the 1978-1979 field test sample (Table 1) the percent of items with a mean square fit less than 1.5 ranged from 88 to 96 for the total test and 92 to 100 for the common items. The percent of items whose discrimination index was within the recommended 1.0 ± .2 interval ranged from 52 to 75 for the total test and 56 to 84 for the common items.

In Table 2, this same analysis is provided for the calibration of the 1977-1978 final form sample. In this sample, the percent of items with total mean square fit less than 1.5 ranged from 91 to 100 for the total test and from 96 to 100 for the common items. Sixty-one to 87 percent of the items on the total test had slopes within the acceptable range, while 56 to 92 percent of the common items were within this range.

The analysis of the 1978-1979 final form calibration sample, given in Table 3, indicates similar findings. In this sample, between 92 and 100 percent of the items on the total test showed acceptable mean square fit statistics. The breakdowns for the common items were very like those for the total test. Similarly, the percent of items with slopes in the acceptable region was 61 to 79 for both total test and common item pool. For this sample both the total test and the common item pool showed the same proportion of items in the two types of item fit categories, i.e., mean square fit and discrimination index.

In summary, the three analyses above show moderate to good fit to the model. The percent of fitting items vis-à-vis the mean square fit statistic was high in all samples. On the other hand, while the item slopes showed less conformity to the model, the percent of items in the acceptable range was similar to, if not better than, those reported by Rentz and Bashaw (1975) in their Rasch analysis of the Anchor Test Study data. Rentz and Rentz (1978) state that the inclusion of a few non-fitting items would probably do no serious damage in test equating applications.

Common Item Stability

Figures one through eight plot the twenty-five common item difficulty estimates from the calibration of the 1978-1979 field test against those from the calibration of the 1977-1978 final form for the four reading and four mathematics tests. According to the model, the estimates in each pair should be statistically equivalent, except for a single constant of translation that is the same for all items. The difficulty estimates in these types of plots should be located along a unit slope straight line. The intercept of this line is equal to the average difference in item difficulty estimates (equating constant) between these two calibrations. These plots provide an indication of the relative stability of the common item pool. Certainly when equating, one wants a set of items whose calibrations are stable over time, to be able to use them to adjust the difficulties of the other items to a particular scale. What one looks for, then, in these plots are the outliers, i.e., the items which fall away from the unit slope line.
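
The visual check described here can also be written as a residual from the unit slope line. The outliers in this study were identified by inspecting the plots, so the numerical tolerance below (0.5 logits) is purely a hypothetical cut-off chosen for the example, as are the function names and difficulty values.

```python
def unit_line_residuals(base, other):
    """Distance of each common item from the unit slope line whose
    intercept is the mean difference between the two calibrations."""
    shift = sum(a - b for a, b in zip(base, other)) / len(base)
    return [(b + shift) - a for a, b in zip(base, other)]


def flag_outliers(base, other, tolerance=0.5):
    """Positions of items lying further than `tolerance` logits from the
    line. The tolerance is an assumed value, not one used in the study."""
    residuals = unit_line_residuals(base, other)
    return [i for i, r in enumerate(residuals) if abs(r) > tolerance]


# Hypothetical difficulties for four common items, in logits.
final_form = [-1.0, 0.0, 1.0, 2.0]
field_test = [-1.2, -0.2, 0.8, 2.8]
print(flag_outliers(final_form, field_test))
```

Note that a single unstable item also moves the line itself, since it enters the mean difference; this is one reason the choice of which items to drop has no obvious objective rule.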

Examining the plots in figures one through eight, there were a few items which seemed to fall further from the unit slope line than the others. (They have been blackened in.) In one test, sixth grade reading (figure 3), fit of the items to the line was very good and no outliers were detected. The worst case was evident in third grade mathematics (figure 2), where the items seemed to fall all over the place and seven outliers were observed.

In order to try to determine the reasons for these items' seeming instability, their fit to the model was analyzed. However, there appeared to be no consistent pattern of explanation. On some tests, every outlier showed acceptable fit to the model (Math 6 and Read 3) using both mean square fit and slope statistics. In others, a few of the outlier items, but not all, had slopes which were steeper than the model predicts (discrimination indices greater than 1.2). In some of the tests, the outliers had slopes which were flatter (discrimination indices less than .8). In most cases, the outliers showed these aberrant slope statistics for the field test calibration sample only. The field test calibration samples were generally much smaller than the final form samples; however, it is unknown whether their size affected them. Only two of the twenty-five items detected as outliers (out of 200 common items), one item in Math 9 and one item in Math 11, had both mean square fit and discrimination indices out of the range of acceptable fit to the model.

In one test, Math 11, there was also detected a strange clump of common items in the lower left corner of the plot (figure 8). However, on closer inspection of both content and format of these items, no clue was found as to the meaning of this clump.



In summary, while most of the common items seemed to uphold the model's predictions, there were still a few unstable items. Recommendations have been made to delete these items from testing applications (Rentz, 1978), as the effect of a small difference in the computation of the equating constant could mean a shift in the resultant equating of several raw score points. However, as there are really no objective rules for deleting these types of items, i.e., how "unstable" is unstable, and because all items in the common item pool were used in the linear equating, and one of the purposes of this paper was to compare these two methods, the equating was carried out using all the common items.

Equating Results

Table 4 presents a summary of the statistical characteristics of the twenty-five common item sets for each of the eight tests. In most cases, performance on these items during the previous spring administration was better than on the following fall field test, although the differences are quite small. This may be attributed to the fact that these twenty-five items were all placed at the end of the field test form, and such factors as fatigue, boredom and lack of motivation may have interfered with the students' performance on the field test.

Table 5 presents mean item difficulties, standard deviations and equating constants for the twenty-five items common to the 1978-1979 field test and the 1977-1978 final form. The scale of the 1977-1978 form was chosen as the base form, as the state standards were empirically developed from this scale. The equating constants were formed by subtracting the field test mean item difficulty from the final form mean item difficulty. The equating constants are the values which are added to the 1978 field test difficulties to put all the items on the field test on to the scale of the base test.

Table 6 presents mean item difficulties, standard deviations and equating constants for the common items between the 1978-1979 field test, adjusted to the scale of the 1977-1978 test, and the 1978-1979 final form. This second "fine tuning" adjustment was necessary because of the changes in items from field test to final form in the 1978-1979 test. The equating constants in this table were used to adjust both the item difficulties and ability estimates from the 1978-1979 final form to the scale of the 1977-1978 form.



Assignment of Raw Scores

The last step in the equating process is the development of a table of equivalent raw or scaled scores. Equivalent scores are defined by the Rasch model as those which give rise to the same "log" ability estimates (Rentz and Bashaw, 1975). For the present analysis, adjusted log ability estimates from the 1978-1979 final form were matched with those from the 1977-1978 final form and equivalent raw scores were assigned for each of the eight tests.
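
The matching step can be sketched as a nearest-ability lookup between two ability tables. The paper does not state the exact matching rule, so choosing the base-form raw score with the closest log ability is an assumption, and the function name, table values and constant are hypothetical.

```python
def equivalent_raw_scores(new_table, base_table, constant):
    """For each raw score on the new form, shift its log ability by the
    equating constant and return the base-form raw score whose log
    ability is closest (nearest-match rule is an assumption)."""
    result = {}
    for raw_new, ability in new_table.items():
        adjusted = ability + constant
        result[raw_new] = min(
            base_table, key=lambda raw: abs(base_table[raw] - adjusted)
        )
    return result


# Hypothetical ability tables (raw score -> log ability) near a cut-off.
base_1978 = {60: 0.40, 61: 0.50, 62: 0.60, 63: 0.70}
new_1979 = {60: 0.32, 61: 0.41, 62: 0.52}

print(equivalent_raw_scores(new_1979, base_1978, constant=0.2))
```

Because raw scores are whole numbers while abilities are continuous, a small change in the constant can move an equivalent score by a full raw score point, which matters most at the cut-off.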



Table 7 presents the equivalent raw scores derived from both the linear and Rasch methods of equating. Because of the nature of these tests, i.e., minimum competency, the tables are presented for those raw score points at and around the state cut-off score. It is in this range of the distribution where the results of the equating are most crucial. While a difference of one or two raw score points at the extremes of the distribution may not have any practical consequences, at the cut-score this could mean a major difference in the numbers of students who pass the test.

The equivalent raw scores derived from the linear method² and the Rasch method using all twenty-five common items (non-edited) show considerable similarity. In most cases the equivalent raw scores given by the two methods differ by only one raw score point. In eleventh grade reading, the two sets of equivalent raw scores are identical.

In the Rasch model, the resultant equivalent raw scores depend very heavily on the accuracy of the estimated constant. While the results obtained using all twenty-five items were good, it was decided to investigate the actual impact the editing of unstable items might have on the resultant Rasch equivalent raw scores. Therefore, the equating was reworked with the "unstable" items (represented by black dots on figures one through eight) eliminated.³ Table 7 presents equivalent raw scores from this equating based on the edited initial item pool.

²These were derived from analyses prepared by the contractor.
³Tables providing the constants for the equating based on the edited item pool are not included in this paper. They are available upon request from the author.

Table 8 provides a comparison of the two sets of Rasch equivalent raw scores. Since no items were deleted from the sixth grade reading test, only the results from seven tests were compared. In six out of the seven tests, the equivalent raw scores at the state's cut-off score were different. In only two of these six tests were the raw scores from the equating based on the edited item pools closer to those given by the linear method. Overall, the non-edited item pool generally gave the closest results to the linear method. In one test, eleventh grade reading, no differences were observed between the equivalent scores arising from the equating based on the edited and non-edited item pools. Coincidentally, this was the same test where the Rasch and linear equivalent raw scores were identical.

In summary, the results from this comparison of the two methods of equating, viz., linear and Rasch, indicate:

a) There is a good, though not perfect, match between the raw scores derived from the linear and Rasch methodologies;

b) The editing of a common item pool for "unstable" items changed the resultant assigned equivalent raw scores. However, no findings indicated that the editing provided a "better" table of raw scores. Unfortunately, the only criterion of "better" that was used in this application was the match with the raw scores defined by the linear method.

Further research is needed which uses independent criteria by which to evaluate these two methods.






REFERENCES

Angoff, W. H. Scales, Norms and Equivalent Scores. In R. L. Thorndike (ed.), Educational Measurement. Washington: American Council on Education, 1971.

Beard, J. and Pettie, A. A Comparison of Linear and Rasch Equating Results for Basic Skills Assessment Tests. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, California, 1979.

Cartledge, C. M. A Comparison of Equipercentile and Rasch Equating Methodologies. Unpublished doctoral dissertation, University of Georgia, 1974.

Rasch, G. An Item Analysis that takes Individual Differences into Account. British Journal of Mathematical and Statistical Psychology, 1966, 19, 49-57.

Rentz, R. and Bashaw, W. L. Equating Reading Tests with the Rasch Model, Volume I: Final Report. Athens, GA: University of Georgia, Educational Research Laboratory, 1975.

Rentz, R. Monitoring the Quality of an Item Pool Calibrated by the Rasch Model. Paper presented at the annual meeting of the National Council on Measurement in Education, Toronto, Ontario, 1978.

Rentz, R. and Rentz, C. Does the Rasch Model really Work? A discussion for practitioners. Princeton: ERIC Clearinghouse on Tests, Measurement and Evaluation, No. 67, 1978.

Swineford, F. Test Analysis: New Jersey Educational Assessment Program, Minimum Basic Skills Test, 3BNJ. Princeton: Educational Testing Service, SR-79-55, June 1979.

Wright, B. D. Solving Measurement Problems with the Rasch Model. Journal of Educational Measurement, 1977, 14, 97-116.

Wright, B. D. and Mead, R. J. BICAL: Calibrating Items and Scales with the Rasch Model. Research Memorandum No. 23, Statistical Laboratory, Department of Education, University of Chicago, 1977.



Table 1

1978-1979 Field Test Fit and Slope Statistics
(Size of calibration samples ranged from 401 to 577)

Test                Mean Square Fit¹   Slope²   Total Number of Items³
Read 3                    88             69             100
Read 3 (common)          100             84              25
Math 3                    91             71             100
Math 3 (common)           92                             25
Read 6                    95             52              95
Read 6 (common)          100             56              25
Math 6                    95             62             100
Math 6 (common)           96                             25
Read 9                    94                            110
Read 9 (common)          100                             25
Math 9                    96             55              95
Math 9 (common)                                          25
Read 11                   89             75             110
Read 11 (common)          96                             25
Math 11                   90             71              90
Math 11 (common)          95             72              25

¹Percent of items with total MSF < 1.5
²Percent of items with discrimination indices within the interval (.8-1.2)
³Rows marked (common), set in italics in the original, are for the twenty-five common equating items



Table 2

1977-1978 Final Form Fit and Slope Statistics
(Size of calibration samples ranged from 1757 to 2133)

Test                Mean Square Fit¹   Slope²   Total Number of Items³
Read 3                    95             83              75
Read 3 (common)          100                             25
Math 3                    95             84              75
Math 3 (common)          100             92              25
Read 6                   100             66              70
Read 6 (common)          100             75              25
Math 6                    96             61              75
Math 6 (common)          100             64              25
Read 9                    95             74              85
Read 9 (common)          100                             25
Math 9                    91             73              70
Math 9 (common)          100             56              25
Read 11                   98             87              85
Read 11 (common)          96                             25
Math 11                   92             76
Math 11 (common)          96             55              25

¹Percent of items with total MSF < 1.5
²Percent of items with discrimination indices within the interval (.8-1.2)
³Rows marked (common), set in italics in the original, are for the twenty-five common equating items



Table 3

1978-1979 Final Form Fit and Slope Statistics
(Size of calibration samples ranged from 1483 to 2110)

Test                 Mean Square Fit (1)    Slope (2)       Total Number of Items (3)

Read 3                       92                 75                   100
Read 3 (common)          [illegible]            76                    88
Math 3                       93                 78                   100
Math 3 (common)          [illegible]        [illegible]               98
Read 6                       97                 62                    95
Read 6 (common)              98                 61                    83
Math 6                       98                 66                   100
Math 6 (common)          [illegible]        [illegible]               94
Read 9                      100                 62                   110
Read 9 (common)             100             [illegible]               74
Math 9                   [illegible]            64                    95
Math 9 (common)          [illegible]            61                    85
Read 11                      97                 76                   110
Read 11 (common)             96             [illegible]               82
Math 11                      93                 66                    90
Math 11 (common)         [illegible]            63                    76

(1) Percent of items with total MSF < 1.5
(2) Percent of items with discrimination indices within the interval (.8-1.2)
(3) Values in italics in the original, shown here in the rows marked (common),
    are for the common items between the 1978-1979 Field Test and the
    1978-1979 Final Form




FIG. 1: Plot of the item difficulty estimates for the twenty-five common items used to equate the two forms of the third
grade reading test. [Plot not reproduced. Horizontal axis: 1977-1978 Minimum Basic Skills Final Form; both axes run from -3.0 to 3.0.]



FIG. 2: Plot of the item difficulty estimates for the twenty-five common items used to equate the two forms of the third
grade mathematics test. [Plot not reproduced. Horizontal axis: 1977-1978 Minimum Basic Skills Final Form, -3.0 to 3.0.]



FIG. 3: Plot of the item difficulty estimates for the twenty-five common items used to equate the two forms of the sixth
grade reading test. [Plot not reproduced. Horizontal axis: 1977-1978 Minimum Basic Skills Final Form, -3.0 to 3.0.]



FIG. 4: Plot of the item difficulty estimates for the twenty-five common items used to equate the two forms of the sixth
grade mathematics test. [Plot not reproduced. Horizontal axis: 1977-1978 Minimum Basic Skills Final Form, -3.0 to 3.0.]



FIG. 5: Plot of the item difficulty estimates for the twenty-five common items used to equate the two forms of the ninth
grade reading test. [Plot not reproduced. Horizontal axis: 1977-1978 Minimum Basic Skills Final Form, -3.0 to 3.0.]



FIG. 6: Plot of the item difficulty estimates for the twenty-five common items used to equate the two forms of the ninth
grade mathematics test. [Plot not reproduced. Horizontal axis: 1977-1978 Minimum Basic Skills Final Form, -3.0 to 3.0.]



FIG. 7: Plot of the item difficulty estimates for the twenty-five common items used to equate the two forms of the eleventh
grade reading test. [Plot not reproduced. Horizontal axis: 1977-1978 Minimum Basic Skills Final Form.]



FIG. 8: Plot of the item difficulty estimates for the twenty-five common items used to equate the two forms of the eleventh
grade mathematics test. [Plot not reproduced. Horizontal axis: 1977-1978 Minimum Basic Skills Final Form.]




Table 4

Statistical Characteristics
of the Twenty-Five Common Items

Test       Sample       N             Mean          Variance       Standard      [heading
                                                                   Deviation     illegible]

Read 3       F          577           21.67          13.82           3.71          .866
Read 3       A         1757           21.74          12.76           3.57          .868
Math 3       F          401           18.82          27.17           5.21          .755
Math 3       A      [illegible]    [illegible]    [illegible]        4.79          .759
Read 6       F          405           19.61          16.21           4.02          .784
Read 6       A         1906           20.12          14.83           3.85          .804
Math 6       F          446           17.57          24.42           4.94          .704
Math 6       A         1899           18.29          22.22           4.71          .667
Read 9       F          408           20.87          18.74           4.32          .834
Read 9       A         2137           20.84          16.64           4.07          .834
Math 9       F      [illegible]       17.32          26.74           5.17          .692
Math 9       A         2133           18.44          22.39           4.73          .735
Read 11      F          454           21.89          12.71           3.56          .876
Read 11      A         1881           22.21          10.03           3.16          .886
Math 11      F          406           20.98          21.06           4.58          .838
Math 11      A         1894           21.30          13.52           3.67          .848

Legend

F - 1978 Field Test
A - 1977-1978 Final Form






Table 5

Rasch Item Difficulty Estimates for the
Twenty-Five Common Items between the
1978-1979 Field Test and the 1977-1978 Final Form

Test       Sample      Mean Item       Standard       Equating
                       Difficulty      Deviation      Constant

Read 3       F            .173           1.16
Read 3       A            .042           1.19           -.131
Math 3       F            .084            .99
Math 3       A            .016           1.02           -.068
Read 6       F            .204           1.12
Read 6       A            .187           1.12           -.017
Math 6       F           -.113           1.01
Math 6       A           -.054           1.07            .059
Read 9       F            .153            .91
Read 9       A            .049            .88           -.104
Math 9       F            .170           1.04
Math 9       A            .220           1.03            .050
Read 11      F            .091           1.09
Read 11      A            .094           1.22            .003
Math 11      F           -.189            .82
Math 11      A           -.312            .98           -.123

Legend

F - 1978-1979 Field Test
A - 1977-1978 Final Form
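
The equating constants in Table 5 appear to equal the mean item difficulty on the 1977-1978 Final Form (A) minus the mean on the 1978-1979 Field Test (F). The Python sketch below checks that reading against the legible rows. It is an illustration only, not the BICAL procedure itself. The names `mean_difficulty`, `reported_constant`, and `equating_constant` are hypothetical. The Read 3 and Math 3 constants are taken from Table 8 because they are garbled in this table, and Read 11 is omitted because its Field Test mean is not clearly legible.

```python
import math

# Mean item difficulties (logits) of the twenty-five common items,
# as (Field Test F, Final Form A), read from Table 5.
mean_difficulty = {
    "Read 3": (0.173, 0.042),
    "Math 3": (0.084, 0.016),
    "Read 6": (0.204, 0.187),
    "Math 6": (-0.113, -0.054),
    "Read 9": (0.153, 0.049),
    "Math 9": (0.170, 0.220),
    "Math 11": (-0.189, -0.312),
}

# Reported equating constants (Read 3 and Math 3 taken from Table 8).
reported_constant = {
    "Read 3": -0.131,
    "Math 3": -0.068,
    "Read 6": -0.017,
    "Math 6": 0.059,
    "Read 9": -0.104,
    "Math 9": 0.050,
    "Math 11": -0.123,
}


def equating_constant(field_mean, final_mean):
    """Assumed definition: Final Form mean minus Field Test mean."""
    return final_mean - field_mean


computed = {
    test: round(equating_constant(f, a), 3)
    for test, (f, a) in mean_difficulty.items()
}

for test, value in computed.items():
    print(f"{test:8s} computed {value:+.3f}  reported {reported_constant[test]:+.3f}")
```

Under this assumed sign convention, every legible row reproduces the reported constant to three decimal places.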






Table 6

Rasch Item Difficulty Estimates for the Common Items
between the 1978-1979 Field Test and 1978-1979 Final Form

Test       Sample      Mean Item       Standard       Equating       Number
                       Difficulty      Deviation      Constant       of Items

Read 3       A           -.083           1.24                           88
Read 3       AF          -.291           1.28           -.208           88
Math 3       A        [illegible]        1.05                           98
Math 3       AF          -.073           1.03        [illegible]        98
Read 6       A           -.039           1.30                           83
Read 6       AF          -.135           1.34           -.096           83
Math 6       A           -.090           1.03                           94
Math 6       AF          -.050        [illegible]        .046           94
Read 9       A           -.002           1.03                           74
Read 9       AF          -.180           1.00           -.178           74
Math 9       A           -.034           1.19                           85
Math 9       AF          -.070           1.15           -.036           85
Read 11      A        [illegible]        1.17                           82
Read 11      AF       [illegible]        1.19           -.020           82
Math 11      A           -.051           1.15                           76
Math 11      AF          -.192           1.12           -.141           76

Legend

AF - Adjusted 1978-1979 Field Test
A - 1978-1979 Final Form




Table 7

A Comparison of the Results of
Linear and Rasch Equating for Selected Raw Score Intervals
Near the State Cut-Off Scores

THIRD GRADE READING

Raw Score      Linear          Rasch (1)
                               Non-Edited     Edited

   80        [illegible]          83            84
   79        [illegible]          82            83
   78            81               81            82
   77            80               80            81
   76            79               79            80
  *75            79               78            79
   74            78               77            78
   73            77               76            77
   72            76               75            76
   71            75               74            75
   70            74               73            75

THIRD GRADE MATHEMATICS

Raw Score      Linear          Rasch (1)
                               Non-Edited     Edited

   70            72               72            71
   69        [illegible]          71            70
   68        [illegible]          70            69
   67            70           [illegible]   [illegible]
   66            69               68            67
  *65        [illegible]          67            66
   64            67               66            65
   63            66               65            64
   62            65               64            63
   61            64               63            62
   60            63               62            61

(1) The results of the Rasch equating are being reported for both the non-edited
    and edited item pools.

* Denotes the cut-score on the 1977-78 form of the test.






Table 7 (Cont.)

SIXTH GRADE READING

Raw Score      Linear          Rasch (2)

   77            78               77
   76            77               76
   75            76               75
   74            75               74
   73            75               73
  *72            74              72-73
   71            73               72
   70            72               71
   69            71               70
   68           70-71             69
   67            70               68

SIXTH GRADE MATHEMATICS

Raw Score      Linear          Rasch (1)
                               Non-Edited     Edited

   70            69               69            70
   69            68               69            69
   68            67               68            68
   67            66               67            67
   66            65              65-66          66
  *65            64              64-65          65
   64            63               63            64
   63            62               62            63
   62            61               61            62
   61            60               60            61
   60            59               59            60

(1) The results of the Rasch equating are being reported for both the non-edited
    and edited item pools.
(2) There were no items edited for this test.

* Denotes the cut-score on the 1977-78 form of the test.


Table 7 (Cont.)

NINTH GRADE READING

Raw Score      Linear          Rasch (1)
                               Non-Edited     Edited

   88            91               90            92
   87            90               90            91
   86            90              88-89          90
   85            89               88            89
   84            88               87            88
  *83            87               86            87
   82            86               85            86
   81            85               84            85
   80            84               83            84
   79            84               82            84
   78            83               81            83

NINTH GRADE MATHEMATICS

Raw Score      Linear          Rasch (1)
                               Non-Edited     Edited

   67            67               68            69
   66            66               67            68
   65            65               66            67
   64            64               65            66
   63        [illegible]          64            65
  *62        [illegible]          63            64
   61            61               62            63
   60            60               61            62
   59            59               60            61
   58            58               59            60
   57            57               58            59

(1) The results of the Rasch equating are being reported for both the non-edited
    and edited item pools.

* Denotes the cut-score on the 1977-78 form of the test.


Table 7 (Cont.)

ELEVENTH GRADE READING

Raw Score      Linear          Rasch (1)
                               Non-Edited     Edited

   88            89               89            89
   87            88               88            88
   86            87               87            87
   85            86               86            86
   84            85               85            85
  *83            84               84            84
   82            83               83            83
   81            82               82            82
   80            81               81            81
   79            80               80            80
   78            79               79            79

ELEVENTH GRADE MATHEMATICS

Raw Score      Linear          Rasch (1)
                               Non-Edited     Edited

   64            67               67            66
   63            66               66            65
   62            65               65            64
   61            64               64            63
   60            63               63            62
  *59        [illegible]          62            61
   58            62               61            60
   57            61               60            59
   56            60               59            58
   55            59               58            57
   54            59               57            56

(1) The results of the Rasch equating are being reported for both the non-edited
    and edited item pools.

* Denotes the cut-score on the 1977-78 form of the test.
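
The Rasch columns in Table 7 rest on converting a raw score to an ability estimate and back. The Python sketch below shows one standard way to do that under the Rasch model. It finds the ability at which the expected score equals the raw score, shifts it by an equating constant, and reads off the expected raw score on the other form. It is a minimal sketch under stated assumptions and is not the authors' BICAL computation. The function names, the item difficulties, the constant, and the sign convention for applying the constant are all hypothetical.

```python
import math


def expected_score(theta, difficulties):
    """Expected raw score at ability theta under the Rasch model."""
    return sum(1.0 / (1.0 + math.exp(b - theta)) for b in difficulties)


def ability_for_raw_score(raw, difficulties, tol=1e-8, max_iter=100):
    """Solve expected_score(theta) == raw by Newton-Raphson iteration."""
    n = len(difficulties)
    if not 0 < raw < n:
        raise ValueError("raw score must lie strictly between 0 and the test length")
    theta = math.log(raw / (n - raw))  # starting value
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(b - theta)) for b in difficulties]
        residual = raw - sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        step = residual / information
        theta += step
        if abs(step) < tol:
            break
    return theta


def equivalent_raw_score(raw, difficulties_x, difficulties_y, constant):
    """Raw score on form Y matching `raw` on form X.

    Assumption: adding `constant` moves a form-X ability onto the form-Y scale.
    """
    theta_y = ability_for_raw_score(raw, difficulties_x) + constant
    return expected_score(theta_y, difficulties_y)


# Hypothetical 75-item forms with evenly spaced difficulties (not the study's data).
form_x = [-2.0 + 4.0 * i / 74 for i in range(75)]
form_y = [b + 0.1 for b in form_x]
print(round(equivalent_raw_score(59, form_x, form_y, -0.131), 2))
```

In practice the equivalent score is rounded to the nearest raw score point. That rounding explains why a small change in the constant may or may not move the tabled value.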






Table 8

A Comparison of the Rasch Equating Constants from the
Edited and Non-Edited Twenty-Five Common Item Pools

  Read 3      Math 3      Read 6*     Math 6      Read 9      Math 9      Read 11     Math 11

Non-edited item pool constant
  -.131       -.068       -.017       .059        -.104       .050        .003        -.123

Edited item pool constant
  -.215       -.035       -           .024        -.192       .004        -.016       -.047

Difference in constant
  .084        -.033       -           .035        .088        .046        .019        -.076

Approximate differences in log ability values between raw score points at cut-score
  .06-.07     .05-.06     -           .06         .06         .06         .06-.07     .06-.07

Does the Rasch equating with the edited item pool change the equivalent raw scores at cut-point?
  Yes         Yes         -           Slightly    Yes         Yes         No          Yes

Is this change closer to the value given by the linear results?
  Yes         No          -           No          Yes         No          N/A         No

Which Rasch equating is closer to the linear results?
  Edited      Non-Edited  -           Non-Edited  Edited      Non-Edited  Both Same   Non-Edited

* No items were edited from sixth grade reading.
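
The "Difference in constant" row of Table 8 can be reproduced as the non-edited constant minus the edited constant. Dividing that difference by the approximate log ability spacing between adjacent raw scores gives a rough sense of how many raw score points the edit could move the equivalent score. The Python sketch below does both. It is an illustration and not part of the original analysis. The dictionary `table8` and the use of range midpoints for the spacing are assumptions, and the Read 11 spacing is garbled in the source and is assumed here to be .06-.07.

```python
import math

# Per test: (non-edited constant, edited constant, reported difference,
# assumed midpoint of the log ability spacing at the cut-score), from Table 8.
table8 = {
    "Read 3": (-0.131, -0.215, 0.084, 0.065),
    "Math 3": (-0.068, -0.035, -0.033, 0.055),
    "Math 6": (0.059, 0.024, 0.035, 0.060),
    "Read 9": (-0.104, -0.192, 0.088, 0.060),
    "Math 9": (0.050, 0.004, 0.046, 0.060),
    "Read 11": (0.003, -0.016, 0.019, 0.065),  # spacing garbled in source; assumed
    "Math 11": (-0.123, -0.047, -0.076, 0.065),
}

shift_in_raw_points = {}
for test, (non_edited, edited, reported, spacing) in table8.items():
    difference = round(non_edited - edited, 3)
    # The recomputed difference should match the tabled value.
    assert math.isclose(difference, reported, abs_tol=1e-9)
    shift_in_raw_points[test] = difference / spacing
    print(f"{test:8s} difference {difference:+.3f}  "
          f"approx. shift {shift_in_raw_points[test]:+.2f} raw points")
```

On this rough reading, Read 11 has the smallest implied shift (well under half a raw score point) and is the one test where the table reports no change at the cut-point. Math 6, reported as changing "Slightly", falls near half a point.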




